ABSTRACT

The student placement test in English as a Second Language used for
the Eurocentres schools in the United Kingdom differs from traditional
placement tests in that it is a vocabulary test and does not attempt
to measure other aspects of learner knowledge of English. The test
attempts to measure the absolute size of the learner's English
vocabulary. The totally automated test is administered by displaying a
large number of words on a computer screen, asking the learner which
ones he knows, and using mathematical formulas to estimate vocabulary
size. The examinee proceeds through a series of such screens. An
example with French vocabulary illustrates the method. The
mathematical model used for calculating vocabulary size evolved from
military research. Results from administering the test to about 250
students from a wide range of language backgrounds were compared with
results of the previously used placement test at the Eurocentres
schools. The comparison found relatively high correlations between the
tests but some variation by language group. It is concluded that the
test works well for placement but needs further refinement, and that
visual recognition of words may not accurately reflect knowledge of
them. Use of imaginary words in the test should also be reconsidered.
(MSE)

VOCABULARY SIZE AS A PLACEMENT INDICATOR
Paul Meara

Birkbeck College, London University
and

Glyn Jones
Eurocentres

Background 

This paper describes a placement test which was developed for the Eurocentres
Group during 1986-87. The Eurocentres schools, like many other private sector
language schools in the UK, work on a cycle of short courses each lasting for
four weeks. This means that every four weeks there is a huge turn-over of
students, and a large number of new students have to be assessed and assigned
to classes of an appropriate level. In most schools, this assessment is done by
means of a complex battery of tests specially designed for this purpose, and
generally referred to as placement tests. The tests currently used by
Eurocentres, the Joint Entrance Test (JET), are fairly typical of this sort of
test; they comprise a listening comprehension test, a grammar test and a
reading test, supplemented by an oral interview.

The main problem with tests of this sort is that they take a long time to ad- 
minister and mark. In a situation where time is at a premium because classes 
cannot be started until the placement procedure is completed this is obviously 
a serious shortcoming. 

The tests that we have devised differ radically from traditional placement
tests. They are very quick to administer (typically they need only 10-15 minutes
to complete) and because the whole test is run by a small micro-computer the
test is self-scoring and does not need to be checked by a teacher. This represents
a large saving in teacher-hours, and greatly simplifies the placement procedure.

The Test

The test we devised for Eurocentres is very different from a traditional
placement test, in that it is basically a vocabulary test, and does not attempt
to measure other aspects of the learner's knowledge of English. The
justification for this approach is that there is a large body of evidence (for
English as an L1) that vocabulary knowledge is heavily implicated in all
practical language skills, and that in general, speakers with a large vocabulary
perform better on a wide range of linguistic indicators than speakers with a
more limited vocabulary (Anderson and Freebody 1981).

However, our test is not just a traditional vocabulary test of the type
familiar from Cambridge Proficiency examinations. Instead of testing a small
number of vocabulary items with complicated multiple-choice type tests, our
test is an attempt to measure the absolute size of a learner's vocabulary in
English. We do this by simply displaying a large number of English words on the
computer screen and asking the testee to decide whether he knows each of the
words displayed or not. The computer program then uses some sophisticated
mathematical techniques to estimate the testee's actual vocabulary size. The
principal advantage of this methodology is that the test is totally automated.
It takes less than 10 minutes to run, and scores itself without any manual
intervention.

It is obviously not possible to demonstrate this technique in printed format, 
but you will get a rough idea of how the test works if you try the test in Table 1 
before you go any further. 

Table 1 

Look through the French words listed below. Cross out words that you do not 
know well enough to say what they mean. Keep a record of how long it takes 
you to do the test. 



VIVANT        TROUVER          MAGIR         ROMPANT       MELANGE
LIVRER        IVRE             FOMBE         MOUP          VION
LAGUE         INONDATION       SOUTENIR      SIECLE        TORVEAU
PRETRE        REPOS            GANAL         HARTON        TOULE
GOUTER        FOULARD          EXIGER        AVARE         ETOULAGE
ECARTER       MIGN[illegible]  JAMBONNANT    DEMENAGER     POIGNEE
EQUIPE        MISSONNEUR       AJURER        BARRON        CLAGE
TOUTEFOIS     LEUSSE           CRUYER        HESITER       SURPRENDRE
LAVIRE        SID              ROMAN         CHIC          ORNIR
CERISE        PAPIMENT         CONFITURE     GOTER         PONTE

The test in Table 1 presents you with a list of French words and asks you to say
which of these words you know. The words are actually a sample of words from
the deuxième degré of Français Fondamental, which comprises a total of
approximately 2000 high frequency French words, and if you have studied school
French even to an elementary level, you should have been able to recognise at
least some of these words. The test in Table 1 actually contains two types of
item: real words (which you might have recognised) and imaginary, non-existent
words (which you cannot possibly have recognised). This combination of
real and imaginary words gives us four combinations of items and answers:

                    type of item

                    real        imaginary

response YES        RY          IY

response NO         RN          IN
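
The four-way classification above can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the program used in the Eurocentres test: the function name `tally_responses` and the dictionary layout are hypothetical, and the four sample items are taken from Table 1 (TROUVER and CERISE are real French words; MAGIR and FOMBE are not).

```python
def tally_responses(items, answers):
    """Count the four item/response combinations RY, IY, RN and IN.

    items:   dict mapping each word to True (real) or False (imaginary)
    answers: dict mapping each word to True (YES) or False (NO)
    Both structures are hypothetical; the source does not describe
    how the original program stored its data.
    """
    counts = {"RY": 0, "IY": 0, "RN": 0, "IN": 0}
    for word, is_real in items.items():
        # First letter: item type; second letter: the testee's response.
        cell = ("R" if is_real else "I") + ("Y" if answers[word] else "N")
        counts[cell] += 1
    return counts


# Illustrative data only: one response of each kind.
items = {"TROUVER": True, "CERISE": True, "MAGIR": False, "FOMBE": False}
answers = {"TROUVER": True, "CERISE": False, "MAGIR": True, "FOMBE": False}
print(tally_responses(items, answers))
```

With the sample answers shown, each of the four cells is counted once.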



Now suppose that you identified all the real words, and rejected all the
imaginary words in the test. In this case we would want to say that you reliably
recognised the real words, and, because these words are a sample from a set of
2000 words, we would probably want to say that you would be able to recognise
reliably all 2000 words in the set.

Suppose, on the other hand, that you identified half the real words and re- 
jected all the imaginary ones. In this case, we would want to say that you could 
probably recognise 50% of the 2000 word set, i.e. about 1000 words. 

More interesting cases arise where people produce YES responses to imaginary
words. Suppose, for example, you recognised all the real words, but also
claimed to recognise half the imaginary words. In this case, we would want to
argue that your score of 100% on the real words is too high; it needs to be
reduced because your threshold for saying that you recognise a word is too low.
The size of the adjustment depends on the number of IY responses you make -
obviously, if you make lots of IYs, then your acceptance threshold is very low
and you are likely to produce RY responses by chance.

The mathematics of all this is not too difficult. In the 1950s, the Navy
carried out a great deal of research on how well ASDIC operators could identify
enemy submarines. They were interested in three types of behaviour: times
when an operator correctly identified a submarine that was actually there; times
when an operator failed to identify a submarine that was actually there; and
times when an operator identified a submarine that didn't actually exist. You
will see that there is an obvious parallel between these three situations and the
RY, RN and IY responses described above; all that is necessary is to replace
"submarines" by "French words". The mathematical model devised to handle
the submarine situation (signal detection theory) should also apply to our
vocabulary recognition task.
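
The three worked cases above can be reproduced with a very simple adjustment. The sketch below is an assumption made for illustration: the paper does not give its formula, and the sketch does not implement a full signal detection model. It subtracts the false-alarm rate (the proportion of imaginary words accepted) from the hit rate (the proportion of real words accepted) and scales the result to the 2000-word set. The function name, its parameters and the item counts in the example calls are hypothetical.

```python
def estimate_vocabulary(ry, rn, iy, in_, pool_size=2000):
    """Estimate how many words of the pool a testee knows.

    ry, rn: YES and NO responses to real words
    iy, in_: YES and NO responses to imaginary words
    pool_size: size of the word set being sampled (2000 in the example)

    Assumption: the adjustment is hit rate minus false-alarm rate.
    The source does not state the formula it actually uses.
    """
    hit_rate = ry / (ry + rn)
    false_alarm_rate = iy / (iy + in_)
    # A testee who accepts imaginary words as often as real ones
    # shows no evidence of knowledge, so the score is floored at zero.
    corrected = max(0.0, hit_rate - false_alarm_rate)
    return round(corrected * pool_size)


# Hypothetical counts: 30 real and 20 imaginary items.
print(estimate_vocabulary(ry=30, rn=0, iy=0, in_=20))    # all real, no imaginary
print(estimate_vocabulary(ry=15, rn=15, iy=0, in_=20))   # half real, no imaginary
print(estimate_vocabulary(ry=30, rn=0, iy=10, in_=10))   # all real, half imaginary
```

Under this assumption the first two calls give 2000 and 1000, matching the figures in the text, and the third gives a reduced estimate (1000) rather than the full 2000, which is the behaviour the text asks for.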

The test which we devised for Eurocentres uses this basic principle, but is
rather more complicated than the test outlined above. A schematic version of
our test is shown in Figure 1. Basically, our test is divided up into a number of
levels, each corresponding to a frequency band of 1000 words. The first part of
the test starts off at the highest frequency band, and assesses how many of these
words a testee can be deemed to know by sampling 10 real words and 10
imaginary words. If the testee scores highly on this band, s/he is tested on the
next band, and this process continues until performance drops below a preset
threshold. At this point, the test works out a rough estimate of how many words
we think the testee knows, and tests a further fifty words from the appropriate
frequency band. So, suppose our testee scores 100% on bands 1-4, but only 20%
on band 5, the program reckons that the testee knows somewhere between 4 and
5 thousand words, and does its detailed testing on band 5. The detailed testing
phase actually tests one word in twenty at the appropriate level.

Figure 1 The structure of the test files

Coarse files (C1-C10) contain 10 items from the bottom end of a specified
frequency band. The testee moves through the coarse files in turn, until her
performance is too poor to allow her to continue, or until she successfully
completes the final file C10.

Fine files (F1-F10) contain 50 items from the specified frequency range, and
thus allow us to test explicitly one word in twenty. Once the testee finishes a
fine file, her total vocabulary score is calculated.
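
The two-phase procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the original program. The band size of 1000 words and the fine file of 50 items come from the text. The threshold value of 0.5, the function names, and the choice to fall back to the last band when performance never drops are hypothetical, since the source says only that the threshold is "preset".

```python
BAND_SIZE = 1000       # words per frequency band (from the text)
FINE_FILE_ITEMS = 50   # items in one fine file (from the text)


def choose_fine_band(coarse_scores, threshold=0.5):
    """Return the 1-based band on which detailed testing should run.

    coarse_scores: scores between 0.0 and 1.0 for bands 1, 2, 3, ...
    threshold: hypothetical cut-off; the source does not give its value.
    """
    if not coarse_scores:
        raise ValueError("at least one coarse band score is required")
    for band, score in enumerate(coarse_scores, start=1):
        if score < threshold:
            return band
    # Assumption: if performance never drops, test the last band in detail.
    return len(coarse_scores)


def rough_range(band):
    """Rough vocabulary range (lower, upper) implied by stopping at a band."""
    return ((band - 1) * BAND_SIZE, band * BAND_SIZE)


# The example from the text: 100% on bands 1-4, 20% on band 5.
band = choose_fine_band([1.0, 1.0, 1.0, 1.0, 0.2])
print(band, rough_range(band))
print(BAND_SIZE // FINE_FILE_ITEMS)  # one word in this many is tested
```

For the example in the text, the sketch selects band 5 and a rough range of 4000 to 5000 words, and 50 items from a 1000-word band corresponds to one word in twenty.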

Assessment 

So far we have run three versions of this test with about 250 students from a
wide range of language backgrounds, 109 at the Cambridge Eurocentre School,
and two groups totalling 158 in London. For practical reasons, we have mainly
been interested in correlating the results of our test with the results of the
Eurocentres JET test - i.e. we are interested in establishing how far our
Vocabulary test can be used as an alternative to JET. The results of this work
are summarised in Table 2.



Table 2 

Correlations between the Vocabulary Test and JET 



1: CAMBRIDGE     109 testees

   overall correlation           .664
   subgroups:     French         .549
                  German         .807
   adjustments:                  4 out of 5

2: LONDON        159 testees

   overall correlation           .717
   subgroups:     French         .556
                  Italian        .792
                  Spanish        .723
                  Portuguese     .756
                  German         .790
                  Non-IE         .735
   adjustments:                  9 out of 14





There are a number of interesting points to note here. Firstly, the correlations
between JET and VOC (the vocabulary test) are generally high: in fact, given
the diverse nature of the tests, the correlations are surprisingly high.
Obviously, the correlations are not perfect, but given that JET is itself
unsatisfactory in some ways, this is only to be expected. More interesting is
the fact that the correlations vary slightly for different language groups. In
general, correlations for homogeneous language groups are better than
correlations for mixed groups, and some linguistic groups produce very high
levels of correlation indeed. This is not always the case, however. With the
French speakers studied here, the correlations between VOC and JET are
consistently low. At the moment, we don't really know how to interpret these
differences. One possible explanation is that the VOC test in its present
format is systematically biased against speakers of particular languages, but
it is equally possible that the VOC test is accurate, and that the JET test is
biased in the same way. Some evidence for this latter view comes from another
study (Meara and Buxton 1988) in which very high levels of correlation between
a VOC test and a more traditional M/C test were found with French speakers.

A further check on the effectiveness of the VOC test as a placement indicator
comes from adjustments made to class registers one week after the original
placement by JET. In the Cambridge study (109 Ss) five students were
reallocated to a different group on the basis of their actual performance in
class. Four of these cases were moved up to a higher level than their original
placement, and in every case this move was in line with the placements produced
by VOC. In the London trials (159 Ss), a questionnaire was used to assess major
discrepancies in the placements produced by JET. This trawl produced 14 cases;
in nine of these cases, teachers' assessments agreed with the VOC score rather
than with the JET score. Not surprisingly, if these cases are excluded from the
data, the overall correlation between JET and VOC increases.

Conclusion 

This paper has described a relatively small-scale study which uses a measure of
vocabulary size as a way of placing students at the start of their course. The
data that we have presented suggest that the test works well, though obviously
a great deal more work will be needed before we can claim it is thoroughly
reliable. The test in its present format, for example, is basically a test of
visual familiarity, and it assumes that recognition of a word form is an
adequate test of word knowledge. This assumption is clearly one that needs to
be probed carefully. Obviously, formal recognition is necessary but not
sufficient for word knowledge, so by relying on recognition, the test probably
over-estimates true vocabulary knowledge. Whether this really matters or not is
anybody's guess: it could be, for example, that passive recognition vocabulary
is generally closely related to the size of a learner's active vocabulary, and
that a more accurate estimate of vocabulary size could be obtained by suitably
adjusting the raw scores found on the VOC test. Another problem arises from the
use of imaginary words. The current version of the test uses imaginary words
which are very carefully constructed so that they share the physical
characteristics of the real words in the same set. However, it is clear to us
that some of the imaginary words are easier to handle than others: some can be
rejected instantaneously, while others cause even native speakers of English to
puzzle for a long time. We also think that some imaginary words cause
difficulty to speakers from particular language backgrounds. Again, we don't
know why this should be, but the problem is one that can easily be solved by
further work.

At the moment, then, the best we can say is that the work we have done looks
very promising, and if further developments live up to these promises, then it
looks as though the tedious and time-consuming task of placing students at the
start of a course could be greatly simplified and streamlined. A small
contribution to "applied linguistics in society" perhaps, but one that will be
welcomed by many teachers.

However, the VOC test has other advantages, besides these practical ones.
One major advantage from the research point of view is the speed with which
the VOC test can be administered. Since it only takes ten minutes, there is no
reason why it should not become a standard tool for assessing subjects in
empirical research. At the moment, the research literature uses only vague
labels for describing people who take part in research: "50 first certificate
students", "25 students following a pre-university course at Stanford", or
"150 air-force pilots" are typical examples of this sort of labelling. Clearly
they are not very informative; it would be much more helpful to be told that we
were dealing with, say, 150 air-force pilots who scored a mean of 4500 on the
VOC test with a standard deviation of 50 words. The fact that the VOC test is so
quick to administer makes this kind of standardisation a real possibility.

The VOC test is also interesting because it opens up areas of research which 
have not been accessible before. If the VOC test really does measure vocabu- 
lary size, then we can begin to ask questions like these: 

how fast do people learn new words?

how much individual variation is there in this skill?

is it affected by other variables, such as L1, or L1 vocabulary size?

how effective are different types of teaching program? e.g. do intensive
courses produce more vocabulary learning than less intensive eight-week ones?

how quickly do learners who don't practise lose their vocabulary?

is the fall-out rate such that it reaches a stable asymptote? i.e. is there a
residue of words that you never really forget, no matter how little you
practise?

These are questions that we hope to address in the near future.

To sum up, then, the VOC test started out as a piece of practical research aimed
at providing a solution to an organisational problem. In R and D circles, it is
common to hear people talking about the practical spin-offs from theoretical
research: the VOC test seems to be a clear case of theoretical spin-offs from
practical research. Maybe the real future of Applied Linguistics lies down this
road?